


Institutional Archive of the Naval Postgraduate School 





Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items 


1976 


A programme for the tailored selection of 
response patterns. 


Jackson, Jeffrey Quentin 


Monterey, California. Naval Postgraduate School 
http://hdl.handle.net/10945/17911 
Copyright is reserved by the copyright owner. 












NAVAL POSTGRADUATE SCHOOL 


Monterey, California 





THESIS 


A PROGRAMME FOR THE TAILORED SELECTION 
OF RESPONSE PATTERNS 


by 
Jeffrey Quentin Jackson 


June 1976 


Thesis Advisor: R. A. Weitzman 





Approved for public release; distribution unlimited. 







SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered) 


REPORT DOCUMENTATION PAGE 


1. REPORT NUMBER
2. GOVT ACCESSION NO.
3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)
A Programme for the Tailored Selection of Response Patterns

5. TYPE OF REPORT & PERIOD COVERED
Master's Thesis; June 1976

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)
Jeffrey Quentin Jackson

8. CONTRACT OR GRANT NUMBER(s)

9. PERFORMING ORGANIZATION NAME AND ADDRESS
Naval Postgraduate School
Monterey, California 93940

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

11. CONTROLLING OFFICE NAME AND ADDRESS
Naval Postgraduate School
Monterey, California 93940

12. REPORT DATE
June 1976

13. NUMBER OF PAGES
34

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
Naval Postgraduate School
Monterey, California 93940

15. SECURITY CLASS. (of this report)
Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)
Approved for public release; distribution unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
Tailored pattern analysis

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)


The use of computer techniques to evaluate data in an
attempt to find useful predictors of various criteria is of
continuing interest. The use of stepwise pattern analysis
to select predictors has shown promising results. This paper
presents a refinement of this technique called TPAN, which
allows the items selected to be "tailored" to the various
patterns of the previously selected items. This is followed
by a discussion of the results obtained using TPAN to select
a four-item pattern, from the responses to an advancement
examination, that best predicts performance on the General
Classification Test.


DD Form 1473
S/N 0102-014-6601







A PROGRAMME FOR THE TAILORED SELECTION OF RESPONSE PATTERNS


by


Jeffrey Quentin Jackson
Lieutenant-Commander, Canadian Forces
Bachelor of Science (Engineering Physics)


Submitted in partial fulfillment of the
requirements for the degree of


MASTER OF SCIENCE IN MANAGEMENT 


from the 
NAVAL POSTGRADUATE SCHOOL 
June 1976 







ABSTRACT 


The use of computer techniques to evaluate data in
an attempt to find useful predictors of various
criteria is of continuing interest. The use of
stepwise pattern analysis to select predictors has
shown promising results. This paper presents a
refinement of this technique called TPAN, which allows
the items selected to be "tailored" to the various
patterns of the previously selected items. This is
followed by a discussion of the results obtained using
TPAN to select a four-item pattern, from the responses
to an advancement examination, that best predicts
performance on the General Classification Test.


I. INTRODUCTION 


The increasing complexity of modern society has spawned
a concurrent proliferation of specialized tasks which people
are required to perform. As the training and skill
necessary to carry out these tasks have increased, there has
arisen a desire to select only those most likely to succeed
to undergo such training and perform such tasks. Thus there
is considerable interest in selection procedures and methods
of prediction of success.


In the quest for better and better selectors for more
and more specialized criteria, the complexity of testing
procedures has grown. However, because of the costs of
designing and administering large tests, methods are being
sought to increase the validity of prediction from ever
smaller sets of test items. The use of large digital
computers and improved statistical techniques has aided this
cause considerably.


One such technique to improve the predictive validity of
a set of test items is called pattern analysis. Here,
rather than aggregating the number of right and wrong
answers into a single score, the pattern of right and wrong
answers to the individual questions is analysed. The
theoretical basis of this method is discussed by Lubin and
Osborne in Reference 1, and Weitzman presents a summary of
work on it through 1973 in Reference 2.


Folce (Reference 3) has developed pattern analysis into a
computerized stepwise technique for selecting a subset of
best predictors from a larger set of items. This programme
is called PAIN. The results of this programme compare very
favourably to the use of aggregate scores. It is the
intention of this paper to present a refinement of this
procedure which would allow "tailoring" of the items
selected so that the best item is selected for each pattern
of responses to the previously selected items.





A. GENERAL 


Stepwise pattern analysis is a technique employed to
select, from a set of binary items, a small subset that is
the best predictor for some criterion. These binary items
may reflect correct or incorrect responses to test questions
or indicate whether or not the subject is included in a
demographic group, e.g., black or not black, age between 25
and 30 years or not. The criterion can also be binary, such
as success or failure in training, or it may be continuous,
e.g., final examination score.


Whether the criterion is continuous or binary, the
process of item selection is the same. An item is selected
and a pattern score is computed for all possible patterns
using that item and previously selected items. The pattern
score is obtained by computing the mean score on the
criterion for all subjects having that pattern. For
example, on the first item, the scores of all persons having
an incorrect (zero) response on that item are averaged to
give the zero pattern score, and the same for all persons
having a correct (one) response. For the second item there
are four possible patterns: correct on both items (11),
incorrect on both items (00), correct on item one and
incorrect on item two (10), and incorrect on item one and
correct on item two (01). In all cases, the mean criterion
score of the subjects in each category is assigned as that
pattern score.
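
As an illustration only, the pattern-score computation described above might be sketched in Python as follows. The function name `pattern_scores`, the list-based data layout and the sample data are assumptions made for this sketch; they are not taken from the TPAN or PAIN listings.

```python
from collections import defaultdict

def pattern_scores(responses, scores, items):
    """Mean criterion score for every response pattern on `items`.

    responses: one list of 0/1 item responses per subject
    scores:    one criterion score per subject
    items:     indices of the items that make up the pattern
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for resp, score in zip(responses, scores):
        # The pattern is the string of responses to the chosen items, e.g. "10".
        pattern = "".join(str(resp[i]) for i in items)
        totals[pattern] += score
        counts[pattern] += 1
    return {p: totals[p] / counts[p] for p in totals}

# Hypothetical data: four subjects, two items.
responses = [[1, 1], [1, 0], [0, 1], [1, 1]]
scores = [60.0, 50.0, 40.0, 70.0]
ps = pattern_scores(responses, scores, items=[0, 1])
```

In this made-up sample no subject has the pattern 00, so only three of the four possible two-item patterns receive a score.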





After the pattern scores are determined, each subject is
assigned the pattern score appropriate to his pattern. The
correlation between the subjects' pattern scores and their
actual scores is then calculated. This calculation is
repeated for each item in the set, and the item having the
highest correlation coefficient is selected as the best item
to be added to the subset.
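
The selection step just described can be sketched as below. This is a minimal Python illustration of whole-group stepwise selection under assumed names (`pearson`, `best_next_item`) and made-up data; it is not the code of either programme.

```python
import math

def pearson(xs, ys):
    """Pearson r from raw sums; 0.0 if either variable is constant."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    den_sq = (n * sxx - sx * sx) * (n * syy - sy * sy)
    return (n * sxy - sx * sy) / math.sqrt(den_sq) if den_sq > 0 else 0.0

def best_next_item(responses, scores, chosen):
    """Try each unchosen item and keep the one giving the highest correlation."""
    best_item, best_r = None, -1.0
    for item in range(len(responses[0])):
        if item in chosen:
            continue
        items = chosen + [item]
        # Group the criterion scores by response pattern on the candidate set.
        groups = {}
        for resp, s in zip(responses, scores):
            groups.setdefault(tuple(resp[i] for i in items), []).append(s)
        means = {k: sum(v) / len(v) for k, v in groups.items()}
        # Each subject is assigned the pattern score of his own pattern.
        assigned = [means[tuple(resp[i] for i in items)] for resp in responses]
        r = pearson(assigned, scores)
        if r > best_r:
            best_item, best_r = item, r
    return best_item, best_r

# Hypothetical data: six subjects, three items; item 0 separates high from low.
responses = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
scores = [80.0, 75.0, 40.0, 45.0, 78.0, 42.0]
first_item, first_r = best_next_item(responses, scores, chosen=[])
```

Calling `best_next_item` again with `chosen=[first_item]` would pick the second item for the whole group, which is the behaviour that tailoring replaces.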


Using this method a great deal of information can be
obtained from relatively meager data. For instance, Folce
was able to select only seven of the 70 items in the
Electronics Technician Selection Test and obtain a
correlation of better than 0.8 between the pattern score and
the final grade assignment at the Electronics Technician
School at San Diego, California. However, it should be
possible to get even more information from the same sized
subset by allowing different items to be selected for
different subsets of the sample. That is, having selected
the first item, the sample can be divided into two groups,
those scoring a one on that item and those scoring a zero.
It is quite possible that the next best predictor may be
different for each of these groups, and different from the
best predictor for the group as a whole. While PAIN selects
the next item based on the whole group, tailored pattern
analysis would allow a different item to be selected for
each subgroup. A computer programme called TPAN has been
developed to select such a tailored pattern of four items.


B. TPAN, A TAILORED PATTERN SELECTOR 


TPAN is an ALGOL programme which will select a four-item
pattern with the highest correlation between the pattern
score for each individual and his actual score.

The programme first reads one card which must contain
the number of binary test items for each subject in the
sample (NITMS) and the number of subjects in the sample
(NIS). The number of items is then passed to a FORTRAN
subroutine called INPTTR to read in the data. This
subroutine reads the complete data for one subject and
passes back, to the main programme, the criterion score and
an integer array of ones and zeros which are the item
responses for that subject. Also, if it has reached the end
of the file, the subroutine returns the actual number of
records it has read so that the number of subjects (NIS) can
be updated. To reduce the amount of memory required by the
programme, TPAN compresses the data in the response array so
that the responses to 32 items are contained in one word.
Two new arrays are then formed, each having one entry for
each subject in the sample. One array contains the
criterion scores and the other the item responses. Each
entry in the latter uses as many words as are required to
contain the responses to all the items.
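
The compression step can be pictured with a short Python sketch. The bit ordering used here (item k stored in bit k mod 32 of word k div 32) and the name `pack_responses` are assumptions for illustration; the text does not state which bit of a word TPAN assigns to which item.

```python
WORD_BITS = 32  # responses per word, as stated in the text

def pack_responses(bits):
    """Pack a list of 0/1 responses into as many 32-bit words as needed."""
    n_words = (len(bits) + WORD_BITS - 1) // WORD_BITS
    words = [0] * n_words
    for k, b in enumerate(bits):
        if b:
            # Assumed layout: item k goes to bit (k mod 32) of word (k div 32).
            words[k // WORD_BITS] |= 1 << (k % WORD_BITS)
    return words

# Hypothetical subject with 34 items: ones on items 0, 2 and 33.
bits = [1, 0, 1] + [0] * 30 + [1]
words = pack_responses(bits)
```

With 151 items, as in the application described later, each subject would need five such words.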


Most of the work is done by the subroutine BITPICKER.
This routine, having been passed the array of scores and
responses, selects the item from the responses that has the
highest correlation between pattern scores and actual
scores.


The subject's response to a particular item is
determined by placing a one in a mask only in the bit
corresponding to the item under consideration. A logical
"and" operation is then performed with the word containing
the subject's response. Only if his response to that one
item was a one will the result of the "and" operation be
other than zero. In such case his score will be added to
the sum for the "one" responses and the number of "one"
responses will be incremented. If a zero results from the
"and", the changes will be made to the "zero" response data.
A running total is also kept of the sum of the squares of
the scores of each individual.
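
A minimal Python sketch of this mask-and-accumulate loop follows. The name `tally_item`, the bit layout (item k in bit k mod 32) and the sample data are assumptions for illustration, not details taken from BITPICKER.

```python
def tally_item(packed, scores, item, word_bits=32):
    """Accumulate counts and score sums for the zero and one responses to `item`.

    packed: one list of packed response words per subject
    scores: one criterion score per subject
    """
    seg = item // word_bits            # which word holds the item
    mask = 1 << (item % word_bits)     # a one only in the item's bit
    sum0 = sum1 = sum_sq = 0.0
    n0 = n1 = 0
    for words, score in zip(packed, scores):
        if words[seg] & mask:          # non-zero only if the response was a one
            sum1 += score
            n1 += 1
        else:
            sum0 += score
            n0 += 1
        sum_sq += score * score        # running sum of squared criterion scores
    return n0, sum0, n1, sum1, sum_sq

# Hypothetical data: four subjects, one word each.
packed = [[0b101], [0b001], [0b100], [0b000]]
scores = [10.0, 20.0, 30.0, 40.0]
result = tally_item(packed, scores, item=2)
```

The five returned totals are all that is needed to form the two pattern scores and the correlation coefficient for the item.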


When the responses of all the subjects have been checked
the mean score for the zero and one responses is calculated,
giving the pattern scores. These, along with the sum of the
criterion scores and the sum of the squares of the criterion
scores, are enough data to calculate the correlation
coefficient. The computing formula for the Pearson
product-moment correlation coefficient is used:


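$$
r = \frac{N \sum (cs)(ps) - \left(\sum cs\right)\left(\sum ps\right)}
         {\left\{\left[N \sum ps^{2} - \left(\sum ps\right)^{2}\right]
                 \left[N \sum cs^{2} - \left(\sum cs\right)^{2}\right]\right\}^{1/2}}
$$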





This procedure of obtaining pattern scores and then
calculating the correlation coefficient is repeated for each
item in the set. The item having the highest correlation
coefficient is selected.
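
To show why the accumulated totals suffice, the following Python sketch evaluates the computing formula for a single item from the counts, the score sums and the sum of squares alone. The function name and argument names are hypothetical, and cs and ps are read here as the criterion score and the pattern score.

```python
import math

def correlation_from_sums(n0, sum0, n1, sum1, sum_sq):
    """Pearson r between pattern scores and criterion scores for one item.

    n0, n1:     numbers of zero and one responses
    sum0, sum1: sums of criterion scores for the zero and one responses
    sum_sq:     sum of squared criterion scores over all subjects
    """
    n = n0 + n1
    m0 = sum0 / n0 if n0 else 0.0      # zero pattern score
    m1 = sum1 / n1 if n1 else 0.0      # one pattern score
    sum_cs = sum0 + sum1
    sum_ps = n0 * m0 + n1 * m1         # every subject carries his group mean
    sum_ps_sq = n0 * m0 * m0 + n1 * m1 * m1
    sum_cs_ps = m0 * sum0 + m1 * sum1
    num = n * sum_cs_ps - sum_cs * sum_ps
    den_sq = (n * sum_ps_sq - sum_ps ** 2) * (n * sum_sq - sum_cs ** 2)
    return num / math.sqrt(den_sq) if den_sq > 0 else 0.0

# Hypothetical totals: two zero responses (scores 20, 40), two ones (10, 30).
r = correlation_from_sums(n0=2, sum0=60.0, n1=2, sum1=40.0, sum_sq=3000.0)
```

Returning zero when the denominator vanishes is a choice made for this sketch; the thesis reports that TPAN itself behaves differently when no item discriminates.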

To facilitate the use of this routine for iterations
when it is desired only to use a subset of subjects who had
a particular pattern, a pointing vector is used rather than
directly using the arrays of scores and responses. That is,
the subroutine BITPICKER is always passed the total array of
responses and scores. It is also passed another array
containing the positions in the main array of all subjects
who are to be used in the calculation. This is the
so-called pointing vector.

For example, to select the first item the pointing
vector contains all the integers up to and including the
total number of subjects in the sample. Thus when the
subroutine checks each subject whose number is in the
pointing vector, it checks the whole set. However, as the
data for each subject is checked, his position number is put
into one of two vectors depending on whether the response to
that item is a one or a zero. These two vectors, for the
item with the highest correlation coefficient, are passed
back to the main programme.

When the second item is to be picked, BITPICKER is first
passed the pointing vector to those subjects having a zero
response to the first item. The subroutine will then pick
the item having the highest correlation only for those
subjects having the zero response. The subroutine is then
called again but with the pointing vector for the one
responses. Thus, a different second item may be picked for
this subgroup. In each case two new pointing vectors are
passed back to the main programme, pointing to those
subjects having a zero and those having a one response to
the chosen second items.
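
The pointing-vector idea can be sketched in a few lines of Python. The function name `split_pointer` and the sample data are hypothetical, and positions are 0-based here, whereas the text counts subjects from one.

```python
def split_pointer(ptr, responses, item):
    """Split a pointing vector into zero-response and one-response vectors."""
    ptr0 = [i for i in ptr if responses[i][item] == 0]
    ptr1 = [i for i in ptr if responses[i][item] == 1]
    return ptr0, ptr1

# Hypothetical data: four subjects, two items.
responses = [[0, 1], [1, 0], [1, 1], [0, 0]]

# First item: the pointing vector covers the whole sample.
ptr = list(range(len(responses)))
ptr0, ptr1 = split_pointer(ptr, responses, item=0)

# Second item for the "one" subgroup only: the full arrays are still passed,
# but only the subjects listed in ptr1 are examined.
ptr10, ptr11 = split_pointer(ptr1, responses, item=1)
```

Because only lists of positions are passed around, the score and response arrays are never copied or reordered.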

The main body of the programme is, therefore, a series
of calls to the subroutine BITPICKER passing it the arrays
of scores (SCORE) and responses (RESP), the appropriate
pointing vector (PTR for the first item), and the number of
entries in that vector (NIS). The subroutine returns two
new pointing vectors (PTR0 and PTR1 for the first item), as
well as the correlation coefficient (R), item number (ITM),
totals of ones (TOT1) and zeros (TOT0), and the pattern
scores for ones (MPS1) and zeros (MPS0). There are also
masks passed back and forth to indicate which items have
already been chosen (MASKIN and MASK) and standard
accounting data of the total number of items (NITMS) and the
number of words required to hold all the items at 32 items
per word (NSEGS).

On the second call, the best item for those subjects
having the zero response to item one is desired. Therefore,
the data passed are the pointing vector PTR0 and its length
TOT0. The data returned are: correlation coefficient R0,
item number ITM0, and the pointing vectors, totals and
pattern scores, PTR01, TOT01, MPS01, PTR00, TOT00, and
MPS00.

The final result of TPAN is a set of 16 patterns
described by the binary numbers 0000 through 1111. Each
binary digit represents the response to one of the four
items selected. The first item will be the same for all
patterns. There may be two different second items (one for
each response to item one), four third items and eight
fourth items. The final step in the programme is to
calculate an overall correlation coefficient. Each subject
is assigned the pattern score appropriate to his responses
on the selected items. The correlation coefficient is then
calculated using the same algorithm as for the individual
items.
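
The whole tailored procedure can be sketched recursively in Python. This is an illustration under stated assumptions, not a transcription of TPAN: the names `pearson`, `best_item`, `build` and `predict`, the nested-dictionary tree, the rule of skipping items that do not split a subgroup, and the sample data are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson r from raw sums; 0.0 if either variable is constant."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    den_sq = (n * sxx - sx * sx) * (n * syy - sy * sy)
    return (n * sxy - sx * sy) / math.sqrt(den_sq) if den_sq > 0 else 0.0

def best_item(responses, scores, ptr, used):
    """Best unused item within the subgroup listed in ptr, or None."""
    actual = [scores[i] for i in ptr]
    best = None
    for item in range(len(responses[0])):
        if item in used:
            continue
        ones = [scores[i] for i in ptr if responses[i][item]]
        zeros = [scores[i] for i in ptr if not responses[i][item]]
        if not ones or not zeros:
            continue  # the item does not split this subgroup
        m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
        fitted = [m1 if responses[i][item] else m0 for i in ptr]
        r = pearson(fitted, actual)
        if best is None or r > best[0]:
            best = (r, item)
    return None if best is None else best[1]

def build(responses, scores, ptr, used, depth):
    """Select an item for this subgroup, then tailor each half separately."""
    node = {"score": sum(scores[i] for i in ptr) / len(ptr), "item": None}
    item = best_item(responses, scores, ptr, used) if depth > 0 else None
    if item is not None:
        node["item"] = item
        for bit in (0, 1):
            sub = [i for i in ptr if responses[i][item] == bit]
            node[bit] = build(responses, scores, sub, used | {item}, depth - 1)
    return node

def predict(node, resp):
    """Follow a subject's responses down the tree to his pattern score."""
    while node["item"] is not None:
        node = node[resp[node["item"]]]
    return node["score"]

# Hypothetical data: item 1 matters only when item 0 is zero, item 2 only
# when item 0 is one.
responses = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
             [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
scores = [10.0, 10.0, 20.0, 20.0, 50.0, 50.0, 60.0, 60.0]
tree = build(responses, scores, list(range(8)), frozenset(), depth=2)
overall_r = pearson([predict(tree, r) for r in responses], scores)
```

In this contrived sample the two halves of the group receive different second items, which is the situation tailoring is meant to exploit. A depth of four would correspond to the 16 patterns described above.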

An additional facility provided by TPAN is the ability
to set bounds on the criterion scores which it is desired to
use. This is done by including two more numbers on the
single ALGOL input data card; these are the upper limit of
the desired scores and the lower limit. As the data records
are read, each score is checked against these bounds and if
it is outside the limits that record is rejected. The number
of the record is printed out, as well as the score on which
it was rejected. After all the records have been read, the
number in the sample is revised to allow for the rejected
records.
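
A minimal Python sketch of this bounds check is given below. The function name `filter_by_bounds`, the (score, responses) record layout and the sample records are assumptions for illustration; the inclusive treatment of the limits is also an assumption.

```python
def filter_by_bounds(records, lower, upper):
    """Keep records whose criterion score lies within [lower, upper].

    records: list of (score, responses) pairs in file order
    Returns the kept records and a list of (record number, score) rejections.
    """
    kept, rejected = [], []
    for number, (score, resp) in enumerate(records, start=1):
        if lower <= score <= upper:
            kept.append((score, resp))
        else:
            rejected.append((number, score))  # reported, then dropped
    return kept, rejected

# Hypothetical records: the second and fourth fall outside 1 to 99.
records = [(47, [1, 0]), (0, [0, 0]), (63, [1, 1]), (120, [0, 1])]
kept, rejected = filter_by_bounds(records, lower=1, upper=99)
n_subjects = len(kept)  # sample size revised after rejection
```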

A complete listing of the ALGOL programme is contained
in Appendix A.







III. AN APPLICATION OF TPAN 


In order to test the programme, TPAN was run using as
data the results of an advancement examination to pay grade
7 for boiler technicians. The source data contained the
results for approximately 1100 enlisted men for the 150
items on this examination. From these responses plus an
additional item indicating whether the race of the
individual was black, TPAN was to select the four best items
to predict the subject's score on the General Classification
Test (GCT).

A valid range of 1 to 99 was set for the GCT scores and
a number of records were outside this range (the field
contained either a zero or non-numeric data). TPAN
eliminated these records and the final sample contained 1024
subjects. The results obtained from this run are given in
Table 1.

The values of the correlation coefficients given in the
table are those used in selection of the items and, hence,
represent the correlation only within the subset of subjects
having the pattern shown for the previously selected items.
It will be noted that these correlations are all rather
small, ranging from 0.16 to 0.50. This is to be expected,
however, as the advancement examination is not intended to
measure the same qualities as the GCT. This is further
borne out by the fact that the first item chosen, that is,
the single best indicator of performance on the GCT among
the items considered, was item 1, race.

TABLE 1


SELECTION OF ITEMS FROM ADVANCEMENT EXAMINATION AS
PREDICTORS FOR GCT


pattern      correlation   items          number    mean     number        mean
                                          of 0's    score    of 1's        score

0,1          [illegible]   1              896       47.60    128           41.73
00,01        [illegible]   1,24           399       45.56    497           49.23
000,001      [illegible]   1,24,7         192       44.08    207           46.93
0000,0001    .217          1,24,7,104     110       42.75    82            45.87
0010,0011    [illegible]   1,24,7,23      124       45.48    83            49.10

010,011      .218          1,24,104       267       47.39    230           50.66
0100,0101    .164          1,24,104,93    153       46.65    [illegible]   49.16
0110,0111    .222          1,24,104,86    124       48.80    [illegible]   52.15

10,11        [illegible]   1,7            53        38.45    75            44.05
100,101      .362          1,7,104        32        36.09    21            42.05
1000,1001    [illegible]   1,7,104,23     27        34.67    [illegible]   43.80
1010,1011    [illegible]   1,7,104,13     4         50.00    17            40.17

110,111      [illegible]   1,7,120        27        40.89    42            45.83
1100,1101    .450          1,7,120,101    14        43.71    13            37.85
1110,1111    [illegible]   1,7,120,55     16        48.81    32            44.34





Even given these less than ideal circumstances, the
overall correlation for the four-item patterns was 0.47.
This compares favourably with the figure of 0.40 obtained
for four items selected by PAIN. Moreover, one of the
disadvantages of PAIN is the amount of time and computer
memory required to run it. For the 1100 subject sample PAIN
required 400,000 bytes of memory and 4 minutes to run. TPAN
on the other hand required only 180,000 bytes and ran in
slightly over 3 minutes. This is partly because of the fact
that only two patterns are assessed on each iteration and
partly due to the more efficient handling of the algorithm
allowed by ALGOL.

IV. CONCLUSIONS AND RECOMMENDATIONS 


The results of the study indicate that tailoring the
item selection in pattern analysis enables more information
to be extracted from a four-item pattern than if straight
stepwise selection based on the whole group is used.
However, the actual advantage gained in terms of the amount
of prediction per test item is questionable. There are, in
fact, eight sets of four test items using a total of up to
15 different items. Therefore, if TPAN were to be used to
select items to be included in a minimum length test, its
performance would have to be compared to PAIN selecting a 15
item subset. On the other hand, if it is desired to look at
existing data in an attempt to predict some output, TPAN
should present a distinct advantage.

There are several areas where TPAN could be improved and
extended. The first is the data printed out. As mentioned,
the correlation coefficients that are given are those within
the subset used to pick the next item. More useful values
would be the overall correlations at the end of the
selection of all second, third, and fourth items. The final
one is the only one calculated at present. To accomplish
this would require only the accumulation of one or two
more items of data, which are already available, and two
additional correlation calculations.

Another shortcoming of the programme is its response
when it reaches a point of indifference to all items, i.e.,
the correlation coefficients for all items are zero. At
present, in this situation, the programme prints an
obviously erroneous item number (-32), and sets all of the
statistics (mean scores and totals of zero and one
responses) to zero. This action will disrupt the
calculation of the overall correlation coefficient. The most
reasonable corrective action in this case would be to
terminate selection of items and, when calculating the
overall correlation coefficient, use the pattern and pattern
scores derived for the last good item.

Increasing the number of items in the pattern presents
no programming problems. It is simply necessary to add more
calls to the subroutine BITPICKER, passing it the
appropriate pointing vector. The problems encountered are
statistical. The numbers of patterns and possible different
items double with each addition of one item to the pattern.
With 1100 subjects in the sample, there are already some
subgroups of less than 20 subjects. The validity of items
selected on the basis of such small samples is questionable.
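
The doubling can be written out as a short worked calculation. The symbols k (number of items in the pattern) and N (number of subjects) are introduced here for illustration and do not appear in the original text.

```latex
\text{patterns} = 2^{k}, \qquad
\text{distinct items} \le \sum_{j=0}^{k-1} 2^{j} = 2^{k} - 1, \qquad
\text{mean subgroup size} = \frac{N}{2^{k}}
```

For the four-item run of Section III, with a final sample of N = 1024, this gives 16 patterns, at most 15 distinct items, and a mean of 64 subjects per pattern. A fifth item would give 32 patterns, at most 31 distinct items, and a mean of only 32 subjects per pattern, with the smallest subgroups far below that.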

A final and very interesting area for increasing the
scope of the programme would be to include some ability to
manipulate continuous data as well as binary items. The
programme could be changed to determine the correlation
between any pair of continuous attributes of the subgroups
having pattern responses selected by TPAN. All that would
be required would be to read in an array or arrays of the
values of the continuously variable data for each subject.
Then, after each item was selected, the pointing vector
produced by the BITPICKER subroutine could be used to select
the appropriate subjects' data from the arrays of continuous
variables. Each correlation coefficient thus derived would
be for a subgroup having a particular pattern. Such a
routine could be used to determine for which of several
subgroups, having different patterns of responses, the
correlation was highest. Such a programme could also answer
other interesting questions. For example, if we select a
subgroup having a pattern with high correlation between
pattern and actual scores, how does it affect the
correlation between an independent continuous variable and
the same criterion score?
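
One way this proposed extension might look is sketched below in Python. It is not an existing part of TPAN; the function name `subgroup_correlation`, the arrays `x` and `y` and the sample values are hypothetical.

```python
import math

def subgroup_correlation(ptr, x, y):
    """Pearson r between two continuous variables within one subgroup.

    ptr:  pointing vector (positions of the subjects in the subgroup)
    x, y: full-sample arrays of the two continuous variables
    """
    xs = [x[i] for i in ptr]
    ys = [y[i] for i in ptr]
    n = len(ptr)
    sx, sy = sum(xs), sum(ys)
    sxx, syy = sum(v * v for v in xs), sum(v * v for v in ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    den_sq = (n * sxx - sx * sx) * (n * syy - sy * sy)
    return (n * sxy - sx * sy) / math.sqrt(den_sq) if den_sq > 0 else 0.0

# Hypothetical continuous data for six subjects and two pattern subgroups.
x = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
y = [2.0, 4.0, 6.0, 30.0, 20.0, 10.0]
r_first = subgroup_correlation([0, 1, 2], x, y)
r_second = subgroup_correlation([3, 4, 5], x, y)
```

Comparing the coefficients across the pointing vectors returned for each pattern would show for which subgroup the relationship is strongest.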

APPENDIX A 


LISTING OF ALGOL PROGRAMME TPAN 


The following pages contain a listing of the source file
of the ALGOL programme. The first two columns would not be
part of an input deck, but are included to facilitate
reading the programme. A number in the first column
indicates when a block of code starts. The same number in
the next column indicates the end of that block.

APPENDIX B 


LISTING OF FORTRAN INPUT ROUTINE 


The following listing is the FORTRAN input routine used
with data supplied on the advancement examination. The file
was 160 characters long, with race in column 6 followed by
the responses on the 150 questions. The GCT score was in
columns 157 and 158. The data was read into an integer array
and passed back to the main programme.
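
For readers without the FORTRAN listing, the record layout just described can be illustrated with a short Python sketch. It is not a translation of the original routine. It assumes that race and the responses are coded as the digits 0 and 1, and that a non-numeric GCT field may be returned as zero so that a later bounds check rejects the record; the name `parse_record` and the sample record are hypothetical.

```python
def parse_record(line):
    """Split one 160-character record into (GCT score, list of 151 binary items)."""
    race = int(line[5])                        # column 6
    answers = [int(c) for c in line[6:156]]    # columns 7 to 156, 150 responses
    field = line[156:158]                      # columns 157 and 158
    # Assumption: unreadable scores become 0 so a 1 to 99 range rejects them.
    gct = int(field) if field.strip().isdigit() else 0
    return gct, [race] + answers               # race becomes item 1

# Hypothetical record: race = 1, alternating responses, GCT score of 47.
sample = " " * 5 + "1" + "01" * 75 + "47" + "  "
gct, items = parse_record(sample)
```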